Distributed Genetic Algorithm to Big Data Clustering: A Novel Distributed Encoding Technique

Authors

  • Mustafa H. Hajeer
  • Dipankar Dasgupta
Abstract

Clustering algorithms have emerged as a powerful learning tool for accurately analyzing the massive amounts of data generated by current applications and smart technologies. Their main objective is to partition data into clusters so that objects that are similar according to specific metrics are grouped in the same cluster. There is a wide and diverse body of knowledge on clustering, and there have been attempts to apply these algorithms and scale them to today's data. However, a major challenge in using clustering algorithms is making them scale to the volume and computational cost of clustering big data. In this paper, we describe a mapping between the graph clustering problem and data clustering. Using genetic algorithms, multi-objective optimization, and distributed graph stores, the proposed algorithm (1) transforms big data into distributed RDF graphs using (2) a novel distributed encoding technique. The algorithm (3) scales to large RDF graphs and (4) produces clusters by maximizing graph modularity as its main objective. Results on big data generated with LUBM show (5) the algorithm's ability to handle the challenges posed by such data and (6) results comparable to those of other clustering algorithms.
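The core idea in the abstract, a genetic algorithm that searches for clusters by maximizing graph modularity, can be illustrated with a small single-machine sketch. This is not the authors' method: it assumes an undirected, unweighted graph, uses the standard Newman modularity Q = Σ_c [L_c/m − (d_c/2m)²] as a single fitness objective, and encodes each individual as a plain per-node label vector. The paper's distributed encoding, multi-objective setup, and RDF graph store are not described in the abstract and are not modeled here. The function names `modularity`, `mutate`, `crossover`, and `genetic_clustering` are hypothetical.

```python
import random
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q = sum over clusters c of [L_c / m - (d_c / 2m)^2].

    edges: list of undirected (u, v) pairs, each listed once; labels[i] is node i's cluster.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: number of edges with both endpoints in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in cluster c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def mutate(labels, neighbors, rng, rate):
    # Each node adopts the cluster label of a random neighbour with probability `rate`.
    child = list(labels)
    for node, adj in enumerate(neighbors):
        if adj and rng.random() < rate:
            child[node] = labels[rng.choice(adj)]
    return child


def crossover(a, b, rng):
    # Uniform crossover: each gene (a node's label) is taken from either parent.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def genetic_clustering(n, edges, pop_size=30, generations=100, rate=0.1, seed=0):
    """Elitist GA maximising modularity; returns (best_labels, best_fitness_per_generation)."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # Initial population: singleton clusters perturbed by heavy neighbour-label mutation.
    population = [mutate(list(range(n)), neighbors, rng, 0.5) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: modularity(edges, p), reverse=True)
        history.append(modularity(edges, population[0]))
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), neighbors, rng, rate))
        population = elite + children
    best = max(population, key=lambda p: modularity(edges, p))
    return best, history


if __name__ == "__main__":
    # Two triangles joined by a single edge (2-3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    best, history = genetic_clustering(6, edges)
    print(best, modularity(edges, best))
```

On the two-triangle example, splitting the graph into {0, 1, 2} and {3, 4, 5} gives Q = 5/14 ≈ 0.357, while putting every node in one cluster gives Q = 0. Keeping the better half of each generation unchanged (elitism) guarantees that the best fitness never decreases from one generation to the next. Neighbour-label mutation is used so that clusters only grow along existing edges.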

Publication date: 2016